Leveraging Inexpensive Supervision Signals for Visual Learning

Authors

  • Senthil Purushwalkam
  • Shiva Prakash
  • Abhinav Gupta
Abstract

The success of deep-learning-based methods for computer vision comes at a cost. Most deep neural network models require a large corpus of annotated data for supervision, and obtaining such data is often time-consuming and expensive. For example, collecting bounding box annotations takes 26-42 seconds per box. This requirement hinders the extension of these methods to novel domains. In this thesis, we explore techniques for leveraging inexpensive forms of supervision for visual learning. More specifically, we first propose an approach for learning a pose-encoding visual representation from videos of human actions without any human supervision. We show that the learned representation improves performance on pose estimation and action recognition tasks compared to randomly initialized models. Next, we propose an approach that uses freely available web data and inexpensive image-level labels to learn object detectors. We show that web data, while highly noisy and biased, can be used effectively to improve object localization in the weakly supervised setting.


Similar resources

Machine learning based Visual Evoked Potential (VEP) Signals Recognition

Introduction: Visual evoked potentials contain diagnostic information that has proved important in assessing the functional integrity of the visual system. Because amplitude decreases substantially under extra-macular stimulation in commonly used pattern VEPs, differentiating normal from abnormal signals can be quite an obstacle. Due to developments of use of machine l...


Incidental Supervision: Moving beyond Supervised Learning

Machine Learning and Inference methods have become ubiquitous in our attempt to induce more abstract representations of natural language text, visual scenes, and other messy, naturally occurring data, and support decisions that depend on it. However, learning models for these tasks is difficult partly because generating the necessary supervision signals for it is costly and does not scale. This...


Learning Unsupervised Visual Grounding Through Semantic Self-Supervision

Localizing natural language phrases in images is a challenging problem that requires joint understanding of both the textual and visual modalities. In the unsupervised setting, the lack of supervisory signals exacerbates this difficulty. In this paper, we propose a novel framework for unsupervised visual grounding which uses concept learning as a proxy task to obtain self-supervision. The simple int...


Visual Prediction of Rover Slip: Learning Algorithms and Field Experiments

Perception of the surrounding environment is an essential tool for intelligent navigation in any autonomous vehicle. In the context of Mars exploration, there is a strong motivation to enhance the perception of the rovers beyond geometry-based obstacle avoidance, so as to be able to predict potential interactions with the terrain. In this thesis we propose to remotely predict the amount of slip...


Unsupervised Learning using Sequential Verification for Action Recognition

In this paper, we present an approach for learning a visual representation from the raw spatiotemporal signals in videos. Our representation is learned without supervision from semantic labels. We formulate our method as an unsupervised sequential verification task, i.e., we determine whether a sequence of frames from a video is in the correct temporal order. With this simple task and no semant...



Publication date: 2017